The Perils of Classifying Political Orientation From Text

Authors

  • Hao Yan
  • Allen Lavoie
  • Sanmay Das
Abstract

Political communication often takes complex linguistic forms. Understanding political ideology from text is an important methodological task in studying political interactions between people in both new and traditional media. Therefore, there has been a spate of recent research that either relies on, or proposes new methodology for, the classification of political ideology from text data. In this paper, we study the effectiveness of these techniques for classifying ideology in the context of US politics. We construct three different datasets of conservative and liberal English texts from (1) the congressional record, (2) prominent conservative and liberal media websites, and (3) conservative and liberal wikis, and apply text classification algorithms with a domain adaptation technique. Our results are surprisingly negative. We find that the cross-domain learning performance, benchmarking the ability to generalize from one of these datasets to another, is poor, even though the algorithms perform very well in within-dataset cross-validation tests. We provide evidence that the poor performance is due to differences in the concepts that generate the true labels across datasets, rather than to a failure of domain adaptation methods. Our results suggest the need for extreme caution in interpreting the results of machine learning methodologies for classification of political text across domains. The one exception to our strongly negative results is that the classification methods show some ability to generalize from the congressional record to media websites. We show that this is likely because of the temporal movement of the use of specific phrases from politicians to the media.
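The contrast the abstract draws between within-dataset cross-validation and cross-domain generalization can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the tiny corpora are invented toy sentences (not the paper's datasets), the classifier is a plain multinomial Naive Bayes over whitespace tokens (not necessarily what the authors used), no domain adaptation step is included, and all function and variable names are hypothetical. The toy data is deliberately contrived so that the same words signal opposite labels in the two domains, mimicking a difference in the concept that generates the labels.

```python
import math
from collections import Counter


def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized documents."""
    word_counts = {}
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab


def predict_nb(model, text):
    """Return the label with the highest log-posterior (add-one smoothing)."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(class_counts[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                score += math.log((counts[tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def leave_one_out_accuracy(docs, labels):
    """Within-domain estimate: hold out each document once, train on the rest."""
    hits = 0
    for i in range(len(docs)):
        model = train_nb(docs[:i] + docs[i + 1:], labels[:i] + labels[i + 1:])
        hits += predict_nb(model, docs[i]) == labels[i]
    return hits / len(docs)


def cross_domain_accuracy(src_docs, src_labels, tgt_docs, tgt_labels):
    """Cross-domain estimate: train on the source domain, test on the target."""
    model = train_nb(src_docs, src_labels)
    hits = sum(predict_nb(model, d) == y for d, y in zip(tgt_docs, tgt_labels))
    return hits / len(tgt_docs)


# Invented toy corpora ("C" = conservative, "L" = liberal); NOT real data.
domain_a_docs = [
    "tax relief freedom",
    "tax cuts freedom market",
    "healthcare reform equality",
    "healthcare access equality rights",
]
domain_a_labels = ["C", "C", "L", "L"]

# In the second toy domain, each side mostly writes about the other side's topics.
domain_b_docs = [
    "healthcare reform costs",
    "healthcare mandate costs",
    "tax cuts inequality",
    "tax relief inequality",
]
domain_b_labels = ["C", "C", "L", "L"]

within_acc = leave_one_out_accuracy(domain_a_docs, domain_a_labels)
cross_acc = cross_domain_accuracy(
    domain_a_docs, domain_a_labels, domain_b_docs, domain_b_labels
)
print("within-domain accuracy:", within_acc)
print("cross-domain accuracy:", cross_acc)
```

On this contrived data the held-out score inside the first domain is high while the score on the second domain collapses, because topical words carry opposite labels in the two corpora. It says nothing about the magnitude of the effect the paper reports on real text.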

Similar resources

On the Perils of Universal and Product-Led Thinking; Comment on “How Neoliberalism Is Shaping the Supply of Unhealthy Commodities and What This Means for NCD Prevention”

Lencucha and Thow’s paper offers an important addition and corrective to the burgeoning body of work in public health on the ‘commercial determinants of health’ in the context of non-communicable diseases (NCDs). Rather than tracing the origins of incoherence across policy sectors to the nefarious actions of industry, they argue that we need to be better attuned to the ...

Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Text Classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents in predefined categories based on documents’ contents and labeled-training samples. Since word detection is a difficult and time consuming task in Persian language, Bayesian text classifier is an appropriate approach to deal with different...

Political Leaning Categorization by Exploring Subjectivities in Political Blogs

This paper addresses a relatively new text categorization problem: classifying a political blog as either ‘liberal’ or ‘conservative’, based on its political leaning. Instead of simply using “Bag of Words” features (BoW) as in previous work, we have explored subjectivity manifested in blogs and used subjectivity information thus found to help build political leaning classifiers. Specifically, o...

Critical Discourse Analysis of the Political Novel, “Unsecured Existence”

Political novel is one of the kinds of Persian Literature with special factors, which are different from other contemporary story writing styles.  These types of stories are more accommodated with critical discourse analysis (CDA) among other methods of novel analyses compared to other types, because of their specificity and unique quality of and their close relations with society and political...

Publication date: 2017